Dataset statistics
| Number of variables | 10 |
|---|---|
| Number of observations | 936 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 62.2 KiB |
| Average record size in memory | 68.1 B |
Variable types
| Numeric | 7 |
|---|---|
| Categorical | 3 |
| title has a high cardinality: 935 distinct values | High cardinality |
|---|---|
| genre has a high cardinality: 200 distinct values | High cardinality |
| director has a high cardinality: 607 distinct values | High cardinality |
| rank is uniformly distributed | Uniform |
| title is uniformly distributed | Uniform |
| director is uniformly distributed | Uniform |
| rank has unique values | Unique |
Reproduction
| Analysis started | 2021-01-30 16:47:55.542028 |
|---|---|
| Analysis finished | 2021-01-30 16:48:12.843314 |
| Duration | 17.3 seconds |
| Software version | pandas-profiling v2.10.0 |
| Download configuration | config.yaml |
rank
Real number (ℝ≥0)
| Distinct | 936 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 498.1858974 |
|---|---|
| Minimum | 1 |
| Maximum | 1000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 53.75 |
| Q1 | 246.75 |
| median | 496.5 |
| Q3 | 746.25 |
| 95-th percentile | 946.5 |
| Maximum | 1000 |
| Range | 999 |
| Interquartile range (IQR) | 499.5 |
Descriptive statistics
| Standard deviation | 288.1005611 |
|---|---|
| Coefficient of variation (CV) | 0.5782993108 |
| Kurtosis | -1.202928275 |
| Mean | 498.1858974 |
| Median Absolute Deviation (MAD) | 250 |
| Skewness | 0.009336325751 |
| Sum | 466302 |
| Variance | 83001.93332 |
| Monotonicity | Strictly increasing |
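Several figures in the quantile and descriptive statistics above are derived from a handful of the other reported values. The following is a minimal Python sketch that recomputes them from numbers copied out of those tables; the variable names are illustrative and are not part of the report.

```python
import math

# Values copied from the tables above (variable spanning 1 to 1000).
minimum, maximum = 1, 1000
q1, q3 = 246.75, 746.25
mean = 498.1858974
std = 288.1005611

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance is the squared standard deviation

# Compare against the figures printed in the report.
print("range    ", value_range, "reported 999")
print("IQR      ", iqr, "reported 499.5")
print("CV       ", cv, "reported 0.5782993108")
print("variance ", variance, "reported 83001.93332")
```

The recomputed values agree with the report up to the rounding of the inputs.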
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1000 | 1 | 0.1% |
| 327 | 1 | 0.1% |
| 340 | 1 | 0.1% |
| 339 | 1 | 0.1% |
| 338 | 1 | 0.1% |
| 337 | 1 | 0.1% |
| 335 | 1 | 0.1% |
| 334 | 1 | 0.1% |
| 333 | 1 | 0.1% |
| 332 | 1 | 0.1% |
| Other values (926) | 926 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 |
| Value | Count | Frequency (%) |
| 1000 | 1 | |
| 999 | 1 | |
| 998 | 1 | |
| 997 | 1 | |
| 996 | 1 |
title
Categorical
| Distinct | 935 |
|---|---|
| Distinct (%) | 99.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.7 KiB |
| The Host | 2 |
|---|---|
| The Invitation | 1 |
| Divergent | 1 |
| Big Hero 6 | 1 |
| Relatos salvajes | 1 |
| Other values (930) |
Length
| Max length | 61 |
|---|---|
| Median length | 13 |
| Mean length | 14.65384615 |
| Min length | 2 |
Characters and Unicode
| Total characters | 13716 |
|---|---|
| Distinct characters | 79 |
| Distinct categories | 8 |
| Distinct scripts | 2 |
| Distinct blocks | 2 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
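As a rough illustration of how such character properties can be read, the sketch below uses Python's standard `unicodedata` module to count characters by general category. It is an assumption that this mirrors what the report does internally; the helper name `category_counts` is hypothetical. The two-letter codes correspond to the category names used in the tables of this report: `Ll` (Lowercase Letter), `Lu` (Uppercase Letter), `Zs` (Space Separator), `Po` (Other Punctuation), `Nd` (Decimal Number), `Pd` (Dash Punctuation), `Ps` (Open Punctuation) and `Pe` (Close Punctuation).

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# One of the titles listed in this report.
counts = category_counts("Big Hero 6")
for code, n in counts.most_common():
    print(code, n)
```

Summing such counters over every value of a column would give per-category totals of the kind shown under "Most occurring categories".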
Unique
| Unique | 934 |
|---|---|
| Unique (%) | 99.8% |
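The gap between the Distinct count (935) and the Unique count (934) is consistent with reading "distinct" as the number of different values and "unique" as the number of values that occur exactly once: with 936 rows, a single title appearing twice ("The Host") gives 935 distinct values, of which 934 occur once. That reading is an assumption inferred from the figures. A small Python sketch of the two counts, with a hypothetical helper name and a made-up four-row sample:

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values occurring once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Illustrative sample only: one repeated title and two singletons.
sample = ["The Host", "The Host", "Divergent", "Big Hero 6"]
print(distinct_and_unique(sample))
```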
Sample
| 1st row | Guardians of the Galaxy |
|---|---|
| 2nd row | Prometheus |
| 3rd row | Split |
| 4th row | Sing |
| 5th row | Suicide Squad |
| Value | Count | Frequency (%) |
| The Host | 2 | 0.2% |
| The Invitation | 1 | 0.1% |
| Divergent | 1 | 0.1% |
| Big Hero 6 | 1 | 0.1% |
| Relatos salvajes | 1 | 0.1% |
| Source Code | 1 | 0.1% |
| Disaster Movie | 1 | 0.1% |
| Steve Jobs | 1 | 0.1% |
| It Follows | 1 | 0.1% |
| Superbad | 1 | 0.1% |
| Other values (925) | 925 |
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| the | 288 | 11.7% |
| of | 91 | 3.7% |
| a | 29 | 1.2% |
| and | 22 | 0.9% |
| 2 | 22 | 0.9% |
| in | 19 | 0.8% |
| 15 | 0.6% | |
| to | 12 | 0.5% |
| man | 11 | 0.4% |
| girl | 10 | 0.4% |
| Other values (1348) | 1940 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 1523 | 11.1% |
| e | 1419 | 10.3% |
| a | 830 | 6.1% |
| o | 811 | 5.9% |
| n | 767 | 5.6% |
| r | 761 | 5.5% |
| i | 727 | 5.3% |
| t | 681 | 5.0% |
| s | 576 | 4.2% |
| h | 510 | 3.7% |
| Other values (69) | 5111 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 9759 | |
| Uppercase Letter | 2137 | 15.6% |
| Space Separator | 1523 | 11.1% |
| Other Punctuation | 159 | 1.2% |
| Decimal Number | 101 | 0.7% |
| Dash Punctuation | 29 | 0.2% |
| Open Punctuation | 4 | < 0.1% |
| Close Punctuation | 4 | < 0.1% |
Most frequent character per category
| Value | Count | Frequency (%) |
| e | 1419 | |
| a | 830 | 8.5% |
| o | 811 | 8.3% |
| n | 767 | 7.9% |
| r | 761 | 7.8% |
| i | 727 | 7.4% |
| t | 681 | 7.0% |
| s | 576 | 5.9% |
| h | 510 | 5.2% |
| l | 431 | 4.4% |
| Other values (21) | 2246 |
| Value | Count | Frequency (%) |
| T | 329 | |
| S | 179 | 8.4% |
| M | 137 | 6.4% |
| B | 121 | 5.7% |
| D | 116 | 5.4% |
| A | 106 | 5.0% |
| P | 103 | 4.8% |
| H | 101 | 4.7% |
| C | 100 | 4.7% |
| L | 91 | 4.3% |
| Other values (16) | 754 |
| Value | Count | Frequency (%) |
| 2 | 34 | |
| 3 | 17 | |
| 1 | 14 | |
| 0 | 13 | 12.9% |
| 4 | 6 | 5.9% |
| 5 | 5 | 5.0% |
| 6 | 3 | 3.0% |
| 9 | 3 | 3.0% |
| 8 | 3 | 3.0% |
| 7 | 3 | 3.0% |
| Value | Count | Frequency (%) |
| : | 80 | |
| ' | 35 | |
| . | 21 | 13.2% |
| , | 9 | 5.7% |
| & | 6 | 3.8% |
| ! | 4 | 2.5% |
| ? | 2 | 1.3% |
| / | 2 | 1.3% |
| Value | Count | Frequency (%) |
| (space) | 1523 |
| Value | Count | Frequency (%) |
| - | 29 |
| Value | Count | Frequency (%) |
| ( | 4 |
| Value | Count | Frequency (%) |
| ) | 4 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 11896 | |
| Common | 1820 | 13.3% |
Most frequent character per script
| Value | Count | Frequency (%) |
| e | 1419 | 11.9% |
| a | 830 | 7.0% |
| o | 811 | 6.8% |
| n | 767 | 6.4% |
| r | 761 | 6.4% |
| i | 727 | 6.1% |
| t | 681 | 5.7% |
| s | 576 | 4.8% |
| h | 510 | 4.3% |
| l | 431 | 3.6% |
| Other values (47) | 4383 |
| Value | Count | Frequency (%) |
| (space) | 1523 | |
| : | 80 | 4.4% |
| ' | 35 | 1.9% |
| 2 | 34 | 1.9% |
| - | 29 | 1.6% |
| . | 21 | 1.2% |
| 3 | 17 | 0.9% |
| 1 | 14 | 0.8% |
| 0 | 13 | 0.7% |
| , | 9 | 0.5% |
| Other values (12) | 45 | 2.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 13709 | |
| None | 7 | 0.1% |
Most frequent character per block
| Value | Count | Frequency (%) |
| (space) | 1523 | 11.1% |
| e | 1419 | 10.4% |
| a | 830 | 6.1% |
| o | 811 | 5.9% |
| n | 767 | 5.6% |
| r | 761 | 5.6% |
| i | 727 | 5.3% |
| t | 681 | 5.0% |
| s | 576 | 4.2% |
| h | 510 | 3.7% |
| Other values (64) | 5104 |
| Value | Count | Frequency (%) |
| é | 3 | |
| è | 1 | 14.3% |
| ä | 1 | 14.3% |
| í | 1 | 14.3% |
| á | 1 | 14.3% |
genre
Categorical
| Distinct | 200 |
|---|---|
| Distinct (%) | 21.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.7 KiB |
| Action,Adventure,Sci-Fi | 50 |
|---|---|
| Drama | 43 |
| Comedy,Drama,Romance | 32 |
| Comedy | 30 |
| Drama,Romance | 28 |
| Other values (195) |
Length
| Max length | 26 |
|---|---|
| Median length | 20 |
| Mean length | 18.20512821 |
| Min length | 5 |
Characters and Unicode
| Total characters | 17040 |
|---|---|
| Distinct characters | 31 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 83 |
|---|---|
| Unique (%) | 8.9% |
Sample
| 1st row | Action,Adventure,Sci-Fi |
|---|---|
| 2nd row | Adventure,Mystery,Sci-Fi |
| 3rd row | Horror,Thriller |
| 4th row | Animation,Comedy,Family |
| 5th row | Action,Adventure,Fantasy |
| Value | Count | Frequency (%) |
| Action,Adventure,Sci-Fi | 50 | 5.3% |
| Drama | 43 | 4.6% |
| Comedy,Drama,Romance | 32 | 3.4% |
| Comedy | 30 | 3.2% |
| Drama,Romance | 28 | 3.0% |
| Action,Adventure,Fantasy | 26 | 2.8% |
| Animation,Adventure,Comedy | 26 | 2.8% |
| Comedy,Drama | 25 | 2.7% |
| Comedy,Romance | 25 | 2.7% |
| Crime,Drama,Thriller | 22 | 2.4% |
| Other values (190) | 629 |
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| action,adventure,sci-fi | 50 | 5.3% |
| drama | 43 | 4.6% |
| comedy,drama,romance | 32 | 3.4% |
| comedy | 30 | 3.2% |
| drama,romance | 28 | 3.0% |
| animation,adventure,comedy | 26 | 2.8% |
| action,adventure,fantasy | 26 | 2.8% |
| comedy,romance | 25 | 2.7% |
| comedy,drama | 25 | 2.7% |
| crime,drama,mystery | 22 | 2.4% |
| Other values (190) | 629 |
Most occurring characters
| Value | Count | Frequency (%) |
| r | 1786 | 10.5% |
| , | 1468 | 8.6% |
| a | 1459 | 8.6% |
| e | 1332 | 7.8% |
| m | 1110 | 6.5% |
| i | 1103 | 6.5% |
| o | 1065 | 6.2% |
| n | 865 | 5.1% |
| t | 831 | 4.9% |
| y | 713 | 4.2% |
| Other values (21) | 5308 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 12940 | |
| Uppercase Letter | 2518 | 14.8% |
| Other Punctuation | 1468 | 8.6% |
| Dash Punctuation | 114 | 0.7% |
Most frequent character per category
| Value | Count | Frequency (%) |
| r | 1786 | |
| a | 1459 | |
| e | 1332 | |
| m | 1110 | |
| i | 1103 | |
| o | 1065 | |
| n | 865 | |
| t | 831 | 6.4% |
| y | 713 | 5.5% |
| c | 555 | 4.3% |
| Other values (8) | 2121 |
| Value | Count | Frequency (%) |
| A | 584 | |
| D | 474 | |
| C | 409 | |
| F | 262 | |
| T | 183 | 7.3% |
| H | 136 | 5.4% |
| R | 131 | 5.2% |
| S | 130 | 5.2% |
| M | 120 | 4.8% |
| B | 71 | 2.8% |
| Value | Count | Frequency (%) |
| , | 1468 |
| Value | Count | Frequency (%) |
| - | 114 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 15458 | |
| Common | 1582 | 9.3% |
Most frequent character per script
| Value | Count | Frequency (%) |
| r | 1786 | 11.6% |
| a | 1459 | 9.4% |
| e | 1332 | 8.6% |
| m | 1110 | 7.2% |
| i | 1103 | 7.1% |
| o | 1065 | 6.9% |
| n | 865 | 5.6% |
| t | 831 | 5.4% |
| y | 713 | 4.6% |
| A | 584 | 3.8% |
| Other values (19) | 4610 |
| Value | Count | Frequency (%) |
| , | 1468 | |
| - | 114 | 7.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 17040 |
Most frequent character per block
| Value | Count | Frequency (%) |
| r | 1786 | 10.5% |
| , | 1468 | 8.6% |
| a | 1459 | 8.6% |
| e | 1332 | 7.8% |
| m | 1110 | 6.5% |
| i | 1103 | 6.5% |
| o | 1065 | 6.2% |
| n | 865 | 5.1% |
| t | 831 | 4.9% |
| y | 713 | 4.2% |
| Other values (21) | 5308 |
director
Categorical
| Distinct | 607 |
|---|---|
| Distinct (%) | 64.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.7 KiB |
| Ridley Scott | 8 |
|---|---|
| Paul W.S. Anderson | 6 |
| Michael Bay | 6 |
| David Yates | 6 |
| M. Night Shyamalan | 6 |
| Other values (602) |
Length
| Max length | 32 |
|---|---|
| Median length | 13 |
| Mean length | 13.13782051 |
| Min length | 3 |
Characters and Unicode
| Total characters | 12297 |
|---|---|
| Distinct characters | 69 |
| Distinct categories | 5 |
| Distinct scripts | 2 |
| Distinct blocks | 2 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 418 |
|---|---|
| Unique (%) | 44.7% |
Sample
| 1st row | James Gunn |
|---|---|
| 2nd row | Ridley Scott |
| 3rd row | M. Night Shyamalan |
| 4th row | Christophe Lourdelet |
| 5th row | David Ayer |
| Value | Count | Frequency (%) |
| Ridley Scott | 8 | 0.9% |
| Paul W.S. Anderson | 6 | 0.6% |
| Michael Bay | 6 | 0.6% |
| David Yates | 6 | 0.6% |
| M. Night Shyamalan | 6 | 0.6% |
| Antoine Fuqua | 5 | 0.5% |
| Justin Lin | 5 | 0.5% |
| Danny Boyle | 5 | 0.5% |
| Denis Villeneuve | 5 | 0.5% |
| J.J. Abrams | 5 | 0.5% |
| Other values (597) | 879 |
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| david | 35 | 1.8% |
| john | 23 | 1.2% |
| james | 20 | 1.0% |
| scott | 18 | 0.9% |
| michael | 18 | 0.9% |
| paul | 18 | 0.9% |
| steven | 13 | 0.7% |
| robert | 12 | 0.6% |
| ben | 12 | 0.6% |
| lee | 11 | 0.6% |
| Other values (920) | 1778 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 1156 | 9.4% |
| (space) | 1022 | 8.3% |
| a | 980 | 8.0% |
| n | 876 | 7.1% |
| r | 829 | 6.7% |
| o | 733 | 6.0% |
| i | 685 | 5.6% |
| l | 569 | 4.6% |
| t | 451 | 3.7% |
| s | 441 | 3.6% |
| Other values (59) | 4555 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 9176 | |
| Uppercase Letter | 2013 | 16.4% |
| Space Separator | 1022 | 8.3% |
| Other Punctuation | 67 | 0.5% |
| Dash Punctuation | 19 | 0.2% |
Most frequent character per category
| Value | Count | Frequency (%) |
| e | 1156 | |
| a | 980 | |
| n | 876 | |
| r | 829 | 9.0% |
| o | 733 | 8.0% |
| i | 685 | 7.5% |
| l | 569 | 6.2% |
| t | 451 | 4.9% |
| s | 441 | 4.8% |
| h | 336 | 3.7% |
| Other values (28) | 2120 |
| Value | Count | Frequency (%) |
| S | 196 | 9.7% |
| J | 186 | 9.2% |
| M | 173 | 8.6% |
| A | 141 | 7.0% |
| D | 125 | 6.2% |
| B | 119 | 5.9% |
| C | 117 | 5.8% |
| G | 116 | 5.8% |
| R | 109 | 5.4% |
| L | 99 | 4.9% |
| Other values (17) | 632 |
| Value | Count | Frequency (%) |
| . | 65 | |
| ' | 2 | 3.0% |
| Value | Count | Frequency (%) |
| (space) | 1022 |
| Value | Count | Frequency (%) |
| - | 19 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 11189 | |
| Common | 1108 | 9.0% |
Most frequent character per script
| Value | Count | Frequency (%) |
| e | 1156 | 10.3% |
| a | 980 | 8.8% |
| n | 876 | 7.8% |
| r | 829 | 7.4% |
| o | 733 | 6.6% |
| i | 685 | 6.1% |
| l | 569 | 5.1% |
| t | 451 | 4.0% |
| s | 441 | 3.9% |
| h | 336 | 3.0% |
| Other values (55) | 4133 |
| Value | Count | Frequency (%) |
| (space) | 1022 | |
| . | 65 | 5.9% |
| - | 19 | 1.7% |
| ' | 2 | 0.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 12254 | |
| None | 43 | 0.3% |
Most frequent character per block
| Value | Count | Frequency (%) |
| e | 1156 | 9.4% |
| (space) | 1022 | 8.3% |
| a | 980 | 8.0% |
| n | 876 | 7.1% |
| r | 829 | 6.8% |
| o | 733 | 6.0% |
| i | 685 | 5.6% |
| l | 569 | 4.6% |
| t | 451 | 3.7% |
| s | 441 | 3.6% |
| Other values (46) | 4512 |
| Value | Count | Frequency (%) |
| é | 10 | |
| á | 9 | |
| ó | 4 | 9.3% |
| ö | 4 | 9.3% |
| å | 4 | 9.3% |
| ñ | 3 | 7.0% |
| ç | 3 | 7.0% |
| Ø | 1 | 2.3% |
| í | 1 | 2.3% |
| ë | 1 | 2.3% |
| Other values (3) | 3 | 7.0% |
year
Real number (ℝ≥0)
| Distinct | 11 |
|---|---|
| Distinct (%) | 1.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2012.771368 |
|---|---|
| Minimum | 2006 |
| Maximum | 2016 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 2006 |
|---|---|
| 5-th percentile | 2007 |
| Q1 | 2010 |
| median | 2014 |
| Q3 | 2016 |
| 95-th percentile | 2016 |
| Maximum | 2016 |
| Range | 10 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 3.178987268 |
|---|---|
| Coefficient of variation (CV) | 0.001579408034 |
| Kurtosis | -0.8070367081 |
| Mean | 2012.771368 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.6863119763 |
| Sum | 1883954 |
| Variance | 10.10596005 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=11)
| Value | Count | Frequency (%) |
| 2016 | 268 | |
| 2015 | 123 | |
| 2014 | 95 | 10.1% |
| 2013 | 86 | 9.2% |
| 2012 | 62 | 6.6% |
| 2010 | 59 | 6.3% |
| 2011 | 58 | 6.2% |
| 2009 | 49 | 5.2% |
| 2008 | 49 | 5.2% |
| 2007 | 46 | 4.9% |
| Value | Count | Frequency (%) |
| 2006 | 41 | |
| 2007 | 46 | |
| 2008 | 49 | |
| 2009 | 49 | |
| 2010 | 59 |
| Value | Count | Frequency (%) |
| 2016 | 268 | |
| 2015 | 123 | |
| 2014 | 95 | 10.1% |
| 2013 | 86 | 9.2% |
| 2012 | 62 | 6.6% |
runtime
Real number (ℝ≥0)
| Distinct | 92 |
|---|---|
| Distinct (%) | 9.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 113.2724359 |
|---|---|
| Minimum | 66 |
| Maximum | 187 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 66 |
|---|---|
| 5-th percentile | 88 |
| Q1 | 100 |
| median | 111 |
| Q3 | 123 |
| 95-th percentile | 149 |
| Maximum | 187 |
| Range | 121 |
| Interquartile range (IQR) | 23 |
Descriptive statistics
| Standard deviation | 18.55079827 |
|---|---|
| Coefficient of variation (CV) | 0.1637715135 |
| Kurtosis | 0.6336593054 |
| Mean | 113.2724359 |
| Median Absolute Deviation (MAD) | 12 |
| Skewness | 0.7911194262 |
| Sum | 106023 |
| Variance | 344.1321164 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 108 | 29 | 3.1% |
| 117 | 26 | 2.8% |
| 100 | 26 | 2.8% |
| 110 | 25 | 2.7% |
| 118 | 25 | 2.7% |
| 102 | 25 | 2.7% |
| 106 | 24 | 2.6% |
| 104 | 22 | 2.4% |
| 112 | 22 | 2.4% |
| 101 | 21 | 2.2% |
| Other values (82) | 691 |
| Value | Count | Frequency (%) |
| 66 | 1 | 0.1% |
| 73 | 1 | 0.1% |
| 80 | 2 | |
| 81 | 4 | |
| 82 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 187 | 1 | 0.1% |
| 180 | 2 | |
| 172 | 1 | 0.1% |
| 170 | 1 | 0.1% |
| 169 | 3 |
rating
Real number (ℝ≥0)
| Distinct | 55 |
|---|---|
| Distinct (%) | 5.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.729166667 |
|---|---|
| Minimum | 1.9 |
| Maximum | 9 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 1.9 |
|---|---|
| 5-th percentile | 5.175 |
| Q1 | 6.2 |
| median | 6.8 |
| Q3 | 7.4 |
| 95-th percentile | 8.1 |
| Maximum | 9 |
| Range | 7.1 |
| Interquartile range (IQR) | 1.2 |
Descriptive statistics
| Standard deviation | 0.9352249579 |
|---|---|
| Coefficient of variation (CV) | 0.1389807987 |
| Kurtosis | 1.190310556 |
| Mean | 6.729166667 |
| Median Absolute Deviation (MAD) | 0.6 |
| Skewness | -0.7045209798 |
| Sum | 6298.5 |
| Variance | 0.8746457219 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 6.7 | 47 | 5.0% |
| 7 | 44 | 4.7% |
| 7.1 | 44 | 4.7% |
| 6.3 | 41 | 4.4% |
| 7.3 | 40 | 4.3% |
| 7.8 | 39 | 4.2% |
| 6.6 | 39 | 4.2% |
| 7.2 | 39 | 4.2% |
| 6.5 | 37 | 4.0% |
| 6.2 | 36 | 3.8% |
| Other values (45) | 530 |
| Value | Count | Frequency (%) |
| 1.9 | 1 | |
| 2.7 | 1 | |
| 3.2 | 1 | |
| 3.5 | 2 | |
| 3.7 | 1 |
| Value | Count | Frequency (%) |
| 9 | 1 | 0.1% |
| 8.8 | 1 | 0.1% |
| 8.6 | 3 | |
| 8.5 | 6 | |
| 8.4 | 3 |
votes
Real number (ℝ≥0)
| Distinct | 933 |
|---|---|
| Distinct (%) | 99.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 175270.2169 |
|---|---|
| Minimum | 61 |
| Maximum | 1791916 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 61 |
|---|---|
| 5-th percentile | 1586.25 |
| Q1 | 41593 |
| median | 114918.5 |
| Q3 | 249538 |
| 95-th percentile | 530938.75 |
| Maximum | 1791916 |
| Range | 1791855 |
| Interquartile range (IQR) | 207945 |
Descriptive statistics
| Standard deviation | 190582.4207 |
|---|---|
| Coefficient of variation (CV) | 1.08736341 |
| Kurtosis | 11.27174861 |
| Mean | 175270.2169 |
| Median Absolute Deviation (MAD) | 88404 |
| Skewness | 2.493379996 |
| Sum | 164052923 |
| Variance | 3.632165907 × 10^10 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 291 | 2 | 0.2% |
| 97141 | 2 | 0.2% |
| 1427 | 2 | 0.2% |
| 125693 | 1 | 0.1% |
| 406219 | 1 | 0.1% |
| 299718 | 1 | 0.1% |
| 461509 | 1 | 0.1% |
| 92868 | 1 | 0.1% |
| 240323 | 1 | 0.1% |
| 101058 | 1 | 0.1% |
| Other values (923) | 923 |
| Value | Count | Frequency (%) |
| 61 | 1 | |
| 102 | 1 | |
| 115 | 1 | |
| 164 | 1 | |
| 173 | 1 |
| Value | Count | Frequency (%) |
| 1791916 | 1 | |
| 1583625 | 1 | |
| 1222645 | 1 | |
| 1047747 | 1 | |
| 1045588 | 1 |
revenue
Real number (ℝ≥0)
| Distinct | 790 |
|---|---|
| Distinct (%) | 84.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 80.75192308 |
|---|---|
| Minimum | 0 |
| Maximum | 936.63 |
| Zeros | 1 |
| Zeros (%) | 0.1% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.32 |
| Q1 | 17.4425 |
| median | 48.15 |
| Q3 | 102.4225 |
| 95-th percentile | 292.075 |
| Maximum | 936.63 |
| Range | 936.63 |
| Interquartile range (IQR) | 84.98 |
Descriptive statistics
| Standard deviation | 99.51826197 |
|---|---|
| Coefficient of variation (CV) | 1.232394947 |
| Kurtosis | 11.9166468 |
| Mean | 80.75192308 |
| Median Absolute Deviation (MAD) | 37.275 |
| Skewness | 2.764949299 |
| Sum | 75583.8 |
| Variance | 9903.884465 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 48.15 | 98 | 10.5% |
| 0.03 | 5 | 0.5% |
| 0.04 | 4 | 0.4% |
| 0.01 | 4 | 0.4% |
| 0.05 | 4 | 0.4% |
| 0.32 | 4 | 0.4% |
| 0.02 | 4 | 0.4% |
| 2.2 | 3 | 0.3% |
| 0.54 | 3 | 0.3% |
| 0.15 | 3 | 0.3% |
| Other values (780) | 804 |
| Value | Count | Frequency (%) |
| 0 | 1 | 0.1% |
| 0.01 | 4 | |
| 0.02 | 4 | |
| 0.03 | 5 | |
| 0.04 | 4 |
| Value | Count | Frequency (%) |
| 936.63 | 1 | |
| 760.51 | 1 | |
| 652.18 | 1 | |
| 623.28 | 1 | |
| 533.32 | 1 |
metascore
Real number (ℝ≥0)
| Distinct | 84 |
|---|---|
| Distinct (%) | 9.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 58.98504274 |
|---|---|
| Minimum | 11 |
| Maximum | 100 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 11 |
|---|---|
| 5-th percentile | 31 |
| Q1 | 47 |
| median | 59.5 |
| Q3 | 72 |
| 95-th percentile | 85 |
| Maximum | 100 |
| Range | 89 |
| Interquartile range (IQR) | 25 |
Descriptive statistics
| Standard deviation | 17.19475702 |
|---|---|
| Coefficient of variation (CV) | 0.2915104614 |
| Kurtosis | -0.6122051468 |
| Mean | 58.98504274 |
| Median Absolute Deviation (MAD) | 12.5 |
| Skewness | -0.1238873467 |
| Sum | 55210 |
| Variance | 295.6596691 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 66 | 25 | 2.7% |
| 72 | 25 | 2.7% |
| 68 | 25 | 2.7% |
| 64 | 24 | 2.6% |
| 57 | 23 | 2.5% |
| 51 | 22 | 2.4% |
| 65 | 22 | 2.4% |
| 48 | 21 | 2.2% |
| 81 | 21 | 2.2% |
| 76 | 21 | 2.2% |
| Other values (74) | 707 |
| Value | Count | Frequency (%) |
| 11 | 1 | 0.1% |
| 15 | 1 | 0.1% |
| 16 | 1 | 0.1% |
| 18 | 4 | |
| 19 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 100 | 1 | 0.1% |
| 99 | 1 | 0.1% |
| 98 | 1 | 0.1% |
| 96 | 4 | |
| 95 | 3 |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
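The calculation described above (covariance divided by the product of the standard deviations) can be sketched in plain Python. This is an illustrative implementation, not the code the report itself runs; the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of xs and ys over the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the two variances cancel,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
print(pearson_r(xs, ys))
# Shifting and (positively) rescaling one variable leaves r unchanged.
print(pearson_r([10 * x + 3 for x in xs], ys))
```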
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
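A minimal Python sketch of that recipe: replace each value by its rank, then apply the covariance-over-standard-deviations formula to the ranks. Giving tied values the average of their ranks is a common convention that is assumed here; the text above does not say how ties are handled. All function names are illustrative.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship: rho reaches 1 while r stays below 1.
xs = [1, 2, 3, 4]
ys = [1, 8, 27, 64]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```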
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
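The pair-counting definition above can be written down directly. The sketch below implements exactly that formula, which is usually called τ-a; whether the report uses a tie-corrected variant is not stated here, so treat this as an illustration of the definition only. The function name is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts toward neither group
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

With three observations there are three pairs; in the example two are concordant and one is discordant.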
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. The phik project provides extensive documentation.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
First rows
| | rank | title | genre | director | year | runtime | rating | votes | revenue | metascore |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Guardians of the Galaxy | Action,Adventure,Sci-Fi | James Gunn | 2014 | 121 | 8.1 | 757074 | 333.13 | 76.0 |
| 1 | 2 | Prometheus | Adventure,Mystery,Sci-Fi | Ridley Scott | 2012 | 124 | 7.0 | 485820 | 126.46 | 65.0 |
| 2 | 3 | Split | Horror,Thriller | M. Night Shyamalan | 2016 | 117 | 7.3 | 157606 | 138.12 | 62.0 |
| 3 | 4 | Sing | Animation,Comedy,Family | Christophe Lourdelet | 2016 | 108 | 7.2 | 60545 | 270.32 | 59.0 |
| 4 | 5 | Suicide Squad | Action,Adventure,Fantasy | David Ayer | 2016 | 123 | 6.2 | 393727 | 325.02 | 40.0 |
| 5 | 6 | The Great Wall | Action,Adventure,Fantasy | Yimou Zhang | 2016 | 103 | 6.1 | 56036 | 45.13 | 42.0 |
| 6 | 7 | La La Land | Comedy,Drama,Music | Damien Chazelle | 2016 | 128 | 8.3 | 258682 | 151.06 | 93.0 |
| 7 | 8 | Mindhorn | Comedy | Sean Foley | 2016 | 89 | 6.4 | 2490 | 48.15 | 71.0 |
| 8 | 9 | The Lost City of Z | Action,Adventure,Biography | James Gray | 2016 | 141 | 7.1 | 7188 | 8.01 | 78.0 |
| 9 | 10 | Passengers | Adventure,Drama,Romance | Morten Tyldum | 2016 | 116 | 7.0 | 192177 | 100.01 | 41.0 |
Last rows
| | rank | title | genre | director | year | runtime | rating | votes | revenue | metascore |
|---|---|---|---|---|---|---|---|---|---|---|
| 926 | 989 | Martyrs | Horror | Pascal Laugier | 2008 | 99 | 7.1 | 63785 | 48.15 | 89.0 |
| 927 | 991 | Underworld: Rise of the Lycans | Action,Adventure,Fantasy | Patrick Tatopoulos | 2009 | 92 | 6.6 | 129708 | 45.80 | 44.0 |
| 928 | 992 | Taare Zameen Par | Drama,Family,Music | Aamir Khan | 2007 | 165 | 8.5 | 102697 | 1.20 | 42.0 |
| 929 | 994 | Resident Evil: Afterlife | Action,Adventure,Horror | Paul W.S. Anderson | 2010 | 97 | 5.9 | 140900 | 60.13 | 37.0 |
| 930 | 995 | Project X | Comedy | Nima Nourizadeh | 2012 | 88 | 6.7 | 164088 | 54.72 | 48.0 |
| 931 | 996 | Secret in Their Eyes | Crime,Drama,Mystery | Billy Ray | 2015 | 111 | 6.2 | 27585 | 48.15 | 45.0 |
| 932 | 997 | Hostel: Part II | Horror | Eli Roth | 2007 | 94 | 5.5 | 73152 | 17.54 | 46.0 |
| 933 | 998 | Step Up 2: The Streets | Drama,Music,Romance | Jon M. Chu | 2008 | 98 | 6.2 | 70699 | 58.01 | 50.0 |
| 934 | 999 | Search Party | Adventure,Comedy | Scot Armstrong | 2014 | 93 | 5.6 | 4881 | 48.15 | 22.0 |
| 935 | 1000 | Nine Lives | Comedy,Family,Fantasy | Barry Sonnenfeld | 2016 | 87 | 5.3 | 12435 | 19.64 | 11.0 |